Analyzing Margins in Boosting

Authors

  • Lev Reyzin
  • Robert Schapire
Abstract

While the experimental success of boosting and other voting methods is well documented [11], questions remain about why boosting does not overfit the training data. One explanation of boosting's effectiveness was given by Schapire et al., who argued, by examining margins, that the test error of a boosted classifier does not increase with its size. They showed that boosting effectively increases the margins of training examples and that this increase is connected to AdaBoost's strong generalization performance [12]. Breiman invented arc-gv, an algorithm that increases margins even more aggressively than AdaBoost, yet performs no better. He claimed this contradicts Schapire's margins explanation, since, all else being equal, a better margins distribution did not in this case yield better results [1]. In this paper, we experimentally revisit Breiman's results by comparing the margins and performance of AdaBoost and arc-gv, and we show that our results partially contradict his findings. We also examine the performance of several other voting algorithms and present some studies only indirectly related to this problem. Finally, we outline possibilities for further research and experimentation in this area, which we intend to pursue.
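For concreteness, the quantity at issue can be written down directly. Below is a minimal Python sketch of the normalized margin of each training example under a weighted majority vote; the names H, alpha, and y are illustrative assumptions, not notation from the paper.

    import numpy as np

    def margins(H, alpha, y):
        """Normalized margins of a voting classifier.

        H     : (n_examples, n_rounds) base-hypothesis outputs, each in {-1, +1}
        alpha : (n_rounds,) nonnegative weights chosen by the booster
        y     : (n_examples,) true labels in {-1, +1}
        """
        votes = H @ alpha                 # weighted sum of base predictions
        return y * votes / np.sum(alpha)  # in [-1, 1]; positive iff the vote is correct

    # Example: H = np.array([[1, 1], [1, -1], [-1, -1]]),
    # alpha = np.array([0.7, 0.3]), y = np.array([1, 1, -1])
    # gives margins [1.0, 0.4, 1.0]; arc-gv focuses on the minimum of these.

A "better" margins distribution in the sense discussed above means this vector is pushed toward +1 across the training set.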

Related articles

A more robust boosting algorithm

We present a new boosting algorithm, motivated by the large margins theory for boosting. We give experimental evidence that the new algorithm is significantly more robust against label noise than existing boosting algorithms.


Robust Boosting via Convex Optimization: Theory and Applications

In this work we consider statistical learning problems. A learning machine aims to extract information from a set of training examples such that it is able to predict the associated label on unseen examples. We consider the case where the resulting classification or regression rule is a combination of simple rules – also called base hypotheses. The so-called boosting algorithms iteratively find...


Scaling Boosting by Margin-Based Inclusion of Features and Relations

Boosting is well known to increase the accuracy of propositional and multi-relational classification learners. However, the base learner’s efficiency vitally determines boosting’s efficiency since the complexity of the underlying learner is amplified by iterated calls of the learner in the boosting framework. The idea of restricting the learner to smaller feature subsets in order to increase ef...


Boosting Based on a Smooth Margin

We study two boosting algorithms, Coordinate Ascent Boosting and Approximate Coordinate Ascent Boosting, which are explicitly designed to produce maximum margins. To derive these algorithms, we introduce a smooth approximation of the margin that one can maximize in order to produce a maximum margin classifier. Our first algorithm is simply coordinate ascent on this function, involving a line se...
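For intuition, one standard way to smooth the non-differentiable minimum margin is a log-sum-exp "soft-min"; the sketch below is in that spirit and is not necessarily the paper's exact definition. Writing $(M\lambda)_i = y_i \sum_t \lambda_t h_t(x_i)$ for the unnormalized margin of example $i$ under combination weights $\lambda \ge 0$,

    g(\lambda) = \frac{-\ln \sum_{i=1}^{n} \exp\big( -(M\lambda)_i \big)}{\sum_t \lambda_t}

satisfies $g(\lambda) \le \min_i (M\lambda)_i / \sum_t \lambda_t$, so maximizing it pushes up the normalized minimum margin; the gap is at most $(\ln n)/\sum_t \lambda_t$ and vanishes as the total weight grows.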


Boosting in the Limit: Maximizing the Margin of Learned Ensembles

The “minimum margin” of an ensemble classifier on a given training set is, roughly speaking, the smallest vote it gives to any correct training label. Recent work has shown that the AdaBoost algorithm is particularly effective at producing ensembles with large minimum margins, and theory suggests that this may account for its success at reducing generalization error. We note, however, that the ...
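In symbols, a standard formalization of this quantity (our notation, not the abstract's): for a voting classifier with base hypotheses $h_t(x) \in \{-1, +1\}$ and weights $\alpha_t \ge 0$,

    \mathrm{margin}(x_i, y_i) = \frac{y_i \sum_t \alpha_t h_t(x_i)}{\sum_t \alpha_t},
    \qquad
    \rho_{\min} = \min_{1 \le i \le n} \mathrm{margin}(x_i, y_i),

and $\rho_{\min} > 0$ exactly when the weighted vote classifies every training example correctly.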


Publication date: 2005